A general-purpose method for supervised learning under covariate shift with applications to observational cosmology

Roberto Trotta (SISSA, Italy)

14-Jun-2022, 16:00-17:00

Abstract: Supervised machine learning will be central to the analysis of upcoming large-scale sky surveys. However, selection bias for astronomical objects yields labelled training data that are not representative of the unlabelled target data distribution. This degrades predictive performance, leading to unreliable target predictions and poor generalization. I will present StratLearn, a novel and statistically principled method to improve supervised learning under such covariate shift conditions, based on propensity score stratification. In StratLearn, learners are trained on subgroups ("strata") of the data conditional on the propensity scores, leading to improved covariate balance and much-reduced bias in the model fit. This general-purpose method has promising applications in observational cosmology: it improves upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data, and in the classification of type Ia supernovae (SNe Ia) from photometric data it obtains the best reported AUC on the SNe photometric classification challenge. If time allows, I'll discuss the embedding of such a classification into a full analysis of SNe data to estimate cosmological parameters.
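To make the idea of propensity score stratification concrete, here is a minimal sketch in plain Python. It is not the StratLearn implementation and makes several assumptions that are not in the abstract: a one-dimensional covariate, a logistic model for the propensity score P(labelled | x), five equal-sized strata, and a deliberately trivial per-stratum learner (the mean label of the labelled data in that stratum). All function names are hypothetical.

```python
import math
import random


def propensity(x, w, b):
    """Logistic model of P(object is in the labelled source set | x)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_propensity(xs, is_source, lr=0.1, epochs=500):
    """Fit the 1-D logistic propensity model by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, is_source):
            err = propensity(x, w, b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def assign_strata(scores, n_strata=5):
    """Split objects into equal-sized strata; stratum 0 has the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def stratified_predict(x_src, y_src, x_tgt, n_strata=5):
    """Predict target labels using a separate learner per propensity stratum.

    Assumption: the per-stratum learner is just the mean source label in that
    stratum; an empty stratum falls back to the global source mean.
    """
    xs = list(x_src) + list(x_tgt)
    is_source = [1] * len(x_src) + [0] * len(x_tgt)
    w, b = fit_propensity(xs, is_source)
    strata = assign_strata([propensity(x, w, b) for x in xs], n_strata)

    sums, counts = [0.0] * n_strata, [0] * n_strata
    for i, y in enumerate(y_src):
        sums[strata[i]] += y
        counts[strata[i]] += 1
    global_mean = sum(y_src) / len(y_src)
    means = [sums[k] / counts[k] if counts[k] else global_mean
             for k in range(n_strata)]

    offset = len(x_src)
    preds = [means[strata[offset + j]] for j in range(len(x_tgt))]
    return preds, strata


if __name__ == "__main__":
    random.seed(0)
    # Toy covariate shift: labelled data centred at 0, target data at 1.
    x_src = [random.gauss(0.0, 1.0) for _ in range(200)]
    x_tgt = [random.gauss(1.0, 1.0) for _ in range(200)]
    y_src = [2.0 * x for x in x_src]
    preds, strata = stratified_predict(x_src, y_src, x_tgt)
    print("first target predictions:", [round(p, 2) for p in preds[:5]])
```

The point of the sketch is only the structure: estimate propensity scores on the pooled labelled and unlabelled data, stratify on those scores, then fit and predict within each stratum, so that each learner sees labelled data whose covariates resemble those of the target objects it predicts for.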

other computer science; space physics and aeronomy; data analysis, statistics and probability

Audience: researchers in the topic


IAU-IAA Astrostats & Astroinfo seminar (archived version as of January 2023)

Series comments: This is an archived version of the seminar listing, with information about talks as of January 2023.

Use the following link for the new version: sites.google.com/view/iau-iaaseminar-new

==============

The joint IAU-IAA Astrostats & Astroinfo seminar series focuses on statistical and computational methodological challenges arising in the various fields of astronomy. It discusses existing and new advanced approaches to statistical analysis and data mining of astronomical data.

In the 21st century, increasing resources are devoted to wide-field astronomical surveys, multi-dimensional data, and high-throughput instruments that produce peta-scale datasets and giga-scale samples. In addition to the growing tasks of data storage and management, new statistical tools have been developed or specified for astronomical problems. Astronomical insights require characterizing structure in images, spectra or time series by using non-linear, often high-dimensional models.

This international online seminar series is an initiative of the International Astrostatistics Association and the IAU Astroinformatics and Astrostatistics Commission.

Curators: Stefano Andreon, Fabio Castagna, Andriy Olenko*, Tsutomu T. TAKEUCHI
*contact for this listing
